Log File Formats
Comments and questions: Summary@Summary.Net
Summary works with logs from the following servers (and many others):
WebSTAR - version 1.2.1 or higher
WebTen - see Apache comments
QuidProQuo
Rumpus - some limitations
Boulevard
MacHTTP
NCSA httpd
Microsoft Personal Web Server for Macintosh
Apple Share IP - version 5.0 or higher
Netscape FastTrack and Enterprise
Apache - when set to produce NCSA Common or NCSA Combined format logs
Microsoft IIS - in all log modes. The Microsoft format can optionally be extended by:
FlashLog from Maximized Software
(http://www.maximized.com/products/flashstats/flashlog.htm)
WebTrends (www.webtrends.com) when cookie support is on
Summary will automatically recognize the following log formats:
WebSTAR
NCSA Common (CLF)
NCSA Combined (sometimes called NCSA Extended)
Microsoft IIS Version 3 and Version 4
W3C Extended Log Format (ExLF)
Netscape
MacHTTP
User Defined
Many servers can be configured to produce log files in several different formats, and some formats have a long list of fields that can be included individually. There is often a trade-off between large log files with lots of information and smaller log files that don't tell you everything but actually fit on your hard disk. The following comments will help you choose between the different options and tell you how best to configure your server for Summary.
Comments on specific log formats
WebSTAR format
The WebSTAR server's log format is highly configurable. Summary supports WebSTAR logs in Common Log Format (CLF) and in Extended Log Format (ExLF), but we recommend using WebSTAR Log Format (WLF).
When configuring your log format there are three issues to keep in mind. The more information you put in the log file, the more Summary will be able to report to you. At the same time, the more information you put in, the larger the log files will become, eventually filling your hard disk. Finally, some of the log tokens have become obsolete and have been replaced by newer tokens. The older tokens will still work but are not recommended due to various limitations.
Summary requires the following tokens in WebSTAR format logs:
- DATE
- Date of request.
- URL
- The requested item. Same as CS-URI and CS-URI-STEM.
The following fields are very highly recommended:
- TIME
- Time of request. Required for the Hourly, Time of Day, and Gaps in Service reports.
- HOSTNAME
- Name or IP address of the requesting computer. You can use C-IP instead if you always leave DNS lookups off in WebSTAR, but it is slightly larger. You can use C-IP and C-DNS together, in that order, to keep all available information, even though Summary won't take advantage of it and your log will be larger. CS-IP and CS-HOST together, in that order, also work but are not recommended. Required for the Top Level Domain, Domain, Host, Visits per Host, Hits per Visit, Pages per Visit, Bytes per Visit, Source, Destination, Path, and Reloads reports and the Unique Hosts and Visits columns in various reports.
- BYTES
- Bytes sent, same as BYTES_SENT. Required for the Bytes per Visit, Requests by Bytes, Peak Hours, Peak Days, by File Type, Transfer Size, Transfer Time, and Connection Speed reports and all of the Bytes related columns in various reports.
- SC-STATUS
- Result code. This provides just slightly more information than RESULT (which is also acceptable). CS-STATUS will work but it is not recommended. Required for the Bad Links and Failed Requests reports and the Errors column in various reports.
- REFERER
- Site and page that referred the visitor to your site. Slightly shorter than CS(REFERER), which also works. This field will increase the size of log files substantially. Required for the Domains, Referrers, Search Words, Search Phrases, Full Referrers, New Referrers, Local Referrers, Source, and Destination reports.
The following fields provide additional information for Summary, which enables additional reports. You can decide if they are worth it. They are listed from most interesting to least interesting overall, although that is partly a matter of personal preference.
- AGENT
- Browser making the request. Slightly shorter than CS(USER-AGENT) which also works. This field will noticeably increase the size of the log file. It provides information for the Browser, Platform, Agent, and Web Robots reports.
- USER
- Authenticated user name entered into a name and password dialog when some portion of the site is restricted. Provides information for the Auth User report.
- TRANSFER_TIME
- Time to send the data, in 1/60ths of a second. Much more accurate than TIME_TAKEN, which is in seconds. This field provides information for the Connection Speed report.
- CS(HOST)
- The name of the server the user sent the request to. This field provides information for the Virtual Server report and can be useful in filtering virtual domains.
- METHOD
- The method from the request header, GET, PUT, etc. Same as CS-METHOD and slightly shorter than CS(METHOD). This field provides information for the Method report. Fairly technical.
- SEARCH_ARGS
- CGI arguments. Same as CS-URI-QUERY. This field provides information for the CGI Arguments report, which must also be enabled in the Summary configuration. The value of this report will depend on your use of CGI and plug-in arguments.
- CS(COOKIE)
- Any cookies sent by the browser. This field provides information for the Cookie report, which must also be enabled in the Summary configuration. Not used by most sites.
There are a few other fields that WebSTAR supports, which might be of some use to someone, but Summary doesn't use them:
- FROM
- Almost always empty. This used to be the e-mail address of the user, but privacy concerns caused browsers to stop sending this field. Occasionally filled in by web robots.
- CONNECTION_ID
- The internal WebSTAR id number associated with this connection. I can't imagine ever using this.
- PATH_ARGS
- Portion of the request after a '$' character. This is a WebSTAR specific feature, designed to make programming CGI code easier but hardly ever used.
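As a rough illustration of the trade-off described above, the following Python sketch checks a candidate WebSTAR token list against the requirements in this section. It is not part of Summary: the function name and the token-to-report table are hypothetical, the table covers only a few of the reports named above, and alternate token spellings (RESULT for SC-STATUS, CS(REFERER) for REFERER, and so on) are not handled.

```python
# Hypothetical helper, not part of Summary. The mapping below is copied from
# the field descriptions above and covers only some of the reports.
REQUIRED_TOKENS = {"DATE", "URL"}

REPORT_NEEDS = {
    "Hourly": {"TIME"},
    "Host": {"HOSTNAME"},
    "Bad Links": {"SC-STATUS"},
    "Referrers": {"REFERER"},
    "Browser": {"AGENT"},
    "Auth User": {"USER"},
    "Connection Speed": {"BYTES", "TRANSFER_TIME"},
    "Virtual Server": {"CS(HOST)"},
    "Method": {"METHOD"},
}


def check_webstar_format(format_string):
    """Return (missing required tokens, reports that would lack data)."""
    tokens = set(format_string.upper().split())
    missing = sorted(REQUIRED_TOKENS - tokens)
    disabled = sorted(
        report for report, needs in REPORT_NEEDS.items() if not needs <= tokens
    )
    return missing, disabled


missing, disabled = check_webstar_format("DATE TIME URL HOSTNAME BYTES SC-STATUS")
print(missing)   # required tokens that are absent from the format
print(disabled)  # reports from the table that would have no data
```

A check like this makes it easier to see what a smaller log format costs before committing to it on a busy server.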
QuidProQuo format
QuidProQuo supports Common Log Format and QuidProQuo formats. Through at least version 2.1.2, an error in its Common Log Format output means the logs do not conform to the specification. Use this custom format string to parse QuidProQuo logs in Common Log Format:
HOST SKIP SKIP USER SKIP DATE-CLF FULL-REQUEST CODE BYTES EOL
QuidProQuo native format is very similar to WebSTAR format. See the WebSTAR format description above for more comments.
NCSA Common Log Format
This is a very common format, supported by many servers. Unfortunately it does not provide referrer, agent, transfer time, server name, or cookie information, which disables many reports.
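To show what a Common Log Format line does and does not carry, here is a minimal Python sketch that splits one line into its seven fields. It is an illustration only, not how Summary parses logs; the function name and the sample line are made up, and the pattern assumes the request contains no embedded double quotes.

```python
import re
from datetime import datetime

# host ident authuser [date] "request" status bytes
CLF_LINE = re.compile(r'(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-)')


def parse_clf(line):
    """Parse one CLF line into a dict, or return None if it does not match."""
    m = CLF_LINE.match(line)
    if m is None:
        return None
    host, _ident, user, stamp, request, status, size = m.groups()
    return {
        "host": host,
        "user": None if user == "-" else user,
        # CLF timestamps look like 10/Oct/2000:13:55:36 -0700
        "time": datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z"),
        "request": request,
        "status": int(status),
        "bytes": 0 if size == "-" else int(size),  # "-" means nothing was sent
    }


sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
entry = parse_clf(sample)
print(entry["host"], entry["status"], entry["bytes"])
```

Nothing in the parsed entry identifies the referrer, the browser, or the transfer time, which is why the reports that need them are unavailable with this format.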
NCSA Combined
This is a reasonably common format, supported by many servers. It does not provide transfer time, server name, or cookie information, which disables some reports.
If you have Apache or WebTen you can also add the transfer time in seconds, and optionally the cookie, to the end of the NCSA Combined format. The Apache and WebTen command to get NCSA Combined logs is:
LogFormat "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-agent}i\""
To add transfer time to the end:
LogFormat "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-agent}i\" %T"
To add transfer time and cookie to the end (should all be on one line):
LogFormat "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-agent}i\" %T \"%{Cookie}i\""
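The three LogFormat variants above differ only in what follows the user agent. As a sketch of how a reader of these logs might cope with all three, the Python pattern below treats the trailing transfer time and cookie as optional. The function name and sample lines are hypothetical, and the pattern assumes that quoted fields contain no embedded double quotes.

```python
import re

# NCSA Combined, optionally followed by %T (seconds) and a quoted cookie.
COMBINED_LINE = re.compile(
    r'(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-)'
    r' "([^"]*)" "([^"]*)"'      # referer and user agent
    r'(?: (\d+))?'               # optional transfer time in seconds
    r'(?: "([^"]*)")?\s*$'       # optional cookie
)


def parse_combined(line):
    """Parse one Combined line; missing optional fields come back as None."""
    m = COMBINED_LINE.match(line)
    if m is None:
        return None
    (host, _ident, user, stamp, request, status, size,
     referer, agent, seconds, cookie) = m.groups()
    return {
        "host": host,
        "request": request,
        "status": int(status),
        "bytes": 0 if size == "-" else int(size),
        "referer": referer,
        "agent": agent,
        "seconds": None if seconds is None else int(seconds),
        "cookie": cookie,
    }


base = ('10.0.0.1 - - [01/May/1998:12:00:01 -0500] "GET /index.html HTTP/1.0" '
        '200 1043 "http://example.com/" "Mozilla/4.0"')
print(parse_combined(base)["seconds"])                  # no %T in this line
print(parse_combined(base + ' 2 "id=42"')["cookie"])    # %T and cookie present
```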
W3C Extended Log Format (ExLF)
This is a highly configurable format, but not all servers allow all of the options.
Summary requires the following tokens in the log file:
- DATE
- Date of request.
- CS-URI
- The requested item, essentially the same as CS-URI-STEM which also works.
The following fields are very highly recommended:
- TIME
- Time of request. Required for the Hourly, Time of Day, and Gaps in Service reports.
- C-IP
- Client IP address. Use along with C-DNS if you have DNS lookups turned on, as long as C-IP appears first. CS-IP and CS-HOST can be used together in that order instead but they are obsolete and not recommended. Required for the Top Level Domain, Domain, Host, Visits per Host, Hits per Visit, Pages per Visit, Bytes per Visit, Source, Destination, Path, and Reloads reports and the Unique Hosts and Visits columns in various reports.
- BYTES
- Bytes sent. Same as SC-BYTES. Required for the Bytes per Visit, Requests by Bytes, Peak Hours, Peak Days, by File Type, Transfer Size, Transfer Time, and Connection Speed reports and all of the Bytes related columns in various reports.
- SC-STATUS
- Result code. CS-STATUS will work but it is obsolete and is not recommended. Required for the Bad Links and Failed Requests reports and the Errors column in various reports.
- CS(REFERER)
- Site and page that referred the visitor to your site. This field will increase the size of log files substantially. Required for the Domains, Referrers, Search Words, Search Phrases, Full Referrers, New Referrers, Local Referrers, Source, and Destination reports.
The following fields provide additional information for Summary, which enables additional reports. You can decide if they are worth it. They are listed from most interesting to least interesting overall, although that is partly a matter of personal preference.
- CS(USER-AGENT)
- Browser making the request. This field will noticeably increase the size of the log file. It provides information for the Browser, Platform, Agent, and Web Robots reports.
- CS-USERNAME
- Authenticated user name entered into a name and password dialog when some portion of the site is restricted. Provides information for the Auth User report.
- TIME_TAKEN (WebSTAR) or TIME-TAKEN (Microsoft)
- Time to send the data. This field provides information for the Connection Speed report.
- CS(HOST)
- The name of the server the user sent the request to. You can also use S-IP or S-DNS instead if they are available. This field provides information for the Virtual Server report and can be useful in filtering virtual domains.
- CS-METHOD
- The method from the request header, GET, PUT, etc. Slightly shorter than CS(METHOD). This field provides information for the Method report. Fairly technical.
- CS-URI-QUERY
- CGI arguments. This field provides information for the CGI Arguments report, which must also be enabled in the Summary configuration. The value of this report will depend on your use of CGI and plug-in arguments.
- CS(COOKIE)
- Any cookies sent by the browser. This field provides information for the Cookie report, which must also be enabled in the Summary configuration. Not used by most sites.
There are many fields that ExLF supports, which might be of some use to someone, but Summary doesn't use them. Here are a few of them:
- CS-FROM
- Almost always empty. This used to be the e-mail address of the user, but privacy concerns caused browsers to stop sending this field. Occasionally filled in by web robots.
- CS-VERSION
- The HTTP protocol version number.
- CS-BYTES
- The number of bytes in the request sent by the client.
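An ExLF log names its own fields in a "#Fields:" directive line, so a reader can check the required tokens before looking at any entries. The Python sketch below illustrates that idea under simplifying assumptions: the function name is hypothetical, field names are compared in upper case to match the tokens listed above, and values are assumed to contain no spaces (quoted strings are not handled).

```python
def parse_exlf(lines):
    """Yield one dict per entry, keyed by the upper-cased #Fields names."""
    fields = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("#"):
            # Directive line; only #Fields matters for this sketch.
            if line.lower().startswith("#fields:"):
                fields = [name.upper() for name in line[len("#Fields:"):].split()]
                has_uri = "CS-URI" in fields or "CS-URI-STEM" in fields
                if "DATE" not in fields or not has_uri:
                    raise ValueError("log lacks DATE or CS-URI, which Summary requires")
            continue
        if not fields or not line:
            continue
        values = line.split()
        if len(values) == len(fields):  # silently skip malformed entries
            yield dict(zip(fields, values))


log = [
    "#Version: 1.0",
    "#Fields: date time c-ip cs-uri-stem sc-status",
    "1998-05-01 12:00:01 10.0.0.1 /index.html 200",
]
for entry in parse_exlf(log):
    print(entry["DATE"], entry["C-IP"], entry["CS-URI-STEM"])
```

Because the field list can change partway through a file, the sketch re-reads it at every "#Fields:" line instead of only once at the top.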
Microsoft IIS
IIS Version 3.0 supports Common Log Format and Microsoft Extended Format. Neither one provides referrer or agent information. Summary supports FlashLog Format, from Maximized Software (http://www.maximized.com/products/flashstats/flashlog.htm) which adds the referrer and agent information to the end of the line. Summary also supports the WebTrends modified format (www.webtrends.com) if cookie logging is enabled.
IIS Version 4.0 supports Common Log Format, Microsoft Extended Format, and W3C Extended Format (ExLF). We recommend using the W3C ExLF format since that allows you to customize the tokens appearing in the log file. See the discussion of ExLF for more information about the various tokens.
User Specified Log Formats
If your server produces logs in a format not listed here, you may be able to configure Summary to read the log file by specifying the format manually. You specify your log format by making a string with the following tokens corresponding to the order of the fields in your log file:
- DATE-CLF
- Full date/time in Common Log File format
- DATE-DMY
- Day, month, year
- DATE-MDY
- Month, day, year
- DATE-YMD
- Year, month, day
- TIME-24
- Hour, minute, second
- TIME-12
- Hour, minute, second, AM/PM
- YEAR
- Two or four digit year number
- MONTH
- One or two digit month number
- MONTH-NAME
- Three character month name
- DAY
- One or two digit day of the month
- HOUR
- One or two digit hour of the day
- MINUTE
- One or two digit minute of the hour
- SECOND
- One or two digit second of the minute
- FULL-REQUEST
- The original request line
- HOST
- Host name
- URI
- The requested resource, with optional '?' portion
- URI-QUERY
- The '?' portion of the request
- STATUS
- Three digit HTTP response code
- WEBSTAR-RESULT
- WebSTAR four character response code
- BYTES
- Number of bytes transferred
- TRAN-TIME-SECS
- The transfer time in seconds
- TRAN-TIME-TICKS
- The transfer time in 1/60ths of a second
- TRAN-TIME-MILLI
- The transfer time in milliseconds
- TRAN-TIME-HMS
- The transfer time in HH:MM:SS, or 1/60ths
- REFERER
- The referer from HTTP header
- AGENT
- The agent from HTTP header
- USER
- The user name from authorization
- METHOD
- The HTTP request method
- SERVER
- The server name, often from HTTP host field
- COOKIE
- The HTTP cookie field
- SKIP
- Skip to the next field
- IF-EOL
- If at EOL, return valid entry, else continue
- MUST-EOL
- Must be at the end of the line
- EOL
- Skip to the end of the line, must be last if used
- W3SVC
- Must match "W3SVC"
- FIXUP
- Fix FlashLog parsing to WebTrends layout
- CHAR
- Skip one character of input
Summary will automatically parse the log file into tokens, handle quoted strings, and find fields separated by a space, comma and a space, or tabs.
For example NCSA Common Log Format would be specified with the string:
HOST SKIP USER DATE-CLF FULL-REQUEST CODE BYTES EOL
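To make the idea of a format string concrete, here is a small Python sketch that applies one to a log line. It is only an approximation of the behaviour described above, not Summary's actual parser: it splits on spaces, comma-space, and tabs, keeps quoted strings together, and (as an assumption of this sketch) also keeps a bracketed date together as one field. Only SKIP and EOL are treated specially; every other token name is used as a plain label, and the remaining control tokens (IF-EOL, MUST-EOL, W3SVC, FIXUP, CHAR) are not implemented.

```python
import re

# A field is a quoted string, a [bracketed] span, or a run of characters that
# ends at whitespace or at a comma followed by whitespace.
FIELD = re.compile(r'"([^"]*)"|\[([^\]]*)\]|((?:[^\s,]|,(?!\s))+)')


def tokenize(line):
    """Split a log line into fields, dropping quotes and brackets."""
    return [next(g for g in m.groups() if g is not None)
            for m in FIELD.finditer(line)]


def parse_with_format(format_string, line):
    """Label the fields of `line` using a Summary-style format string."""
    fields = tokenize(line)
    entry = {}
    position = 0
    for name in format_string.split():
        if name == "EOL":          # ignore whatever is left on the line
            break
        if position >= len(fields):
            return None            # line is shorter than the format expects
        if name != "SKIP":
            entry[name] = fields[position]
        position += 1
    return entry


line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
entry = parse_with_format("HOST SKIP USER DATE-CLF FULL-REQUEST CODE BYTES EOL", line)
print(entry["HOST"], entry["USER"], entry["BYTES"])
```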